
    Inferring orthologous gene regulatory networks using interspecies data fusion

    MOTIVATION: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant, counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second, hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, and the networks are additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. RESULTS: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, using Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase.
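
The abstract says network similarity between species is captured via graph kernels, but it does not say which kernel. As an illustration only, the sketch below uses one simple stand-in, assumed here rather than taken from the paper: map one species' edges through an orthologue table, then take the cosine-normalised count of shared directed edges (a normalised linear kernel on binary adjacency vectors). The function name, gene IDs and orthologue map are hypothetical.

```python
import math


def edge_overlap_kernel(edges_a, edges_b, orthologs):
    """Cosine-normalised count of shared directed edges between two species.

    edges_a:   set of (regulator, target) pairs in species A
    edges_b:   set of (regulator, target) pairs in species B
    orthologs: dict mapping species-A gene IDs to species-B gene IDs
    Edges whose genes lack an orthologue are dropped before comparison.
    (Hypothetical stand-in for a graph kernel, for illustration only.)
    """
    mapped = {
        (orthologs[r], orthologs[t])
        for r, t in edges_a
        if r in orthologs and t in orthologs
    }
    if not mapped or not edges_b:
        return 0.0
    shared = len(mapped & edges_b)
    # Equivalent to the cosine similarity of the two binary adjacency vectors.
    return shared / math.sqrt(len(mapped) * len(edges_b))


# Hypothetical gene IDs, for illustration only.
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}
net_a = {("a1", "a2"), ("a2", "a3")}
net_b = {("b1", "b2"), ("b3", "b1")}
print(edge_overlap_kernel(net_a, net_b, orthologs))  # 0.5
```

In this sketch, a mislabelled orthologue simply changes which edges are compared, so the score degrades gradually rather than failing outright.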

    Inferring the perturbation time from biological time course data

    MOTIVATION: Time course data are often used to study the changes to a biological process after perturbation. Statistical methods have been developed to determine whether such a perturbation induces changes over time, e.g. by comparing perturbed and unperturbed time course datasets to uncover differences. However, existing methods do not provide a principled statistical approach to identifying the specific time at which the two time course datasets first begin to diverge after a perturbation; we call this the perturbation time. Estimating the perturbation time for different variables in a biological process allows us to identify the sequence of events following a perturbation and therefore provides valuable insights into likely causal relationships. RESULTS: We propose a Bayesian method to infer the perturbation time given time course data from a wild-type and a perturbed system. We use a non-parametric approach based on Gaussian process regression. We derive a probabilistic model of noise-corrupted, replicated time course data that come from the same profile before the perturbation time and diverge after it. The likelihood function can be computed exactly for this model, and the posterior distribution of the perturbation time is obtained by a simple histogram approach, without recourse to complex approximate inference algorithms. We validate the method on simulated data and apply it to study the transcriptional change occurring in Arabidopsis following inoculation with Pseudomonas syringae pv. tomato DC3000 versus the disarmed strain DC3000hrpA. AVAILABILITY AND IMPLEMENTATION: An R package, DEtime, implementing the method is available at https://github.com/ManchesterBioinference/DEtime along with the data and code required to reproduce all the results. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
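
The abstract notes that the likelihood can be computed exactly and that the posterior over the perturbation time comes from a simple histogram over candidate times. The sketch below illustrates that idea under a simplified covariance construction assumed here, not the DEtime kernel: the perturbed series equals the control before t_p, and afterwards adds an independent GP deviation h(t) - h(t_p), which keeps the two profiles continuous at t_p. Hyperparameters are fixed rather than estimated, replicates are not modelled, and all function names are hypothetical.

```python
import numpy as np


def rbf(a, b, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def joint_cov(t, tp, noise=0.01, **kw):
    """Covariance of [control; perturbed] observed at the same times t.

    Assumed model: perturbed(t) = control(t) for t <= tp, and
    control(t) + h(t) - h(tp) for t > tp, with h an independent GP.
    """
    tp_arr = np.array([tp])
    K = rbf(t, t, **kw)
    # Covariance of h(t) - h(tp), switched on only after the perturbation time.
    Kh = (rbf(t, t, **kw) - rbf(t, tp_arr, **kw)
          - rbf(tp_arr, t, **kw) + rbf(tp_arr, tp_arr, **kw))
    after = (t > tp).astype(float)
    Kh = Kh * np.outer(after, after)
    C = np.block([[K, K], [K, K + Kh]])
    return C + noise * np.eye(2 * len(t))


def log_marginal(y, C):
    """Exact Gaussian log marginal likelihood via a Cholesky factorisation."""
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))


def perturbation_posterior(t, y_control, y_perturbed, grid, **kw):
    """Posterior over candidate perturbation times (uniform prior on grid)."""
    y = np.concatenate([y_control, y_perturbed])
    logp = np.array([log_marginal(y, joint_cov(t, tp, **kw)) for tp in grid])
    w = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return w / w.sum()


# Synthetic example: the two series share a profile until t = 5, then diverge.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 21)
y_c = np.sin(t) + 0.05 * rng.normal(size=t.size)
y_p = np.sin(t) + 0.5 * np.clip(t - 5, 0, None) + 0.05 * rng.normal(size=t.size)
grid = np.linspace(0.5, 9.5, 19)
post = perturbation_posterior(t, y_c, y_p, grid)
# The posterior mode should lie near t = 5 for this synthetic data.
print(grid[np.argmax(post)])
```

Because each candidate time only requires one exact marginal likelihood, the "histogram" is just these normalised weights over the grid; no sampling is involved.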

    Modeling meiotic chromosomes indicates a size dependent contribution of telomere clustering and chromosome rigidity to homologue juxtaposition

    Meiosis is the cell division that halves the genetic complement of diploid cells to form gametes or spores. To achieve this, meiotic cells undergo a radical spatial reorganisation of chromosomes. This reorganisation is a prerequisite for the pairing of parental homologous chromosomes and for the reductional division, which halves the number of chromosomes in daughter cells. Of particular note is the change from a centromere-clustered layout (Rabl configuration) to a telomere-clustered conformation (bouquet stage). The contribution of the bouquet structure to homologous chromosome pairing is uncertain. We have developed a new in silico model to represent the chromosomes of Saccharomyces cerevisiae in space, based on a worm-like chain model constrained by attachment to the nuclear envelope and by clustering forces. We have asked how these constraints could influence chromosome layout, with particular regard to the juxtaposition of homologous chromosomes and potential nonallelic (ectopic) interactions. The data support the view that the bouquet may be sufficient to bring short chromosomes together, but that it contributes less to the juxtaposition of long chromosomes. We also find that persistence length is critical to how much influence the bouquet structure could have, both on pairing of homologues and on avoiding contacts with heterologues. This work represents an important development in computer modeling of chromosomes, and suggests new explanations for why elucidating the functional significance of the bouquet by genetics has been so difficult.
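
The abstract highlights persistence length as a key parameter of the worm-like chain. For a free, unconstrained chain, the standard Kratky-Porod result gives the mean squared end-to-end distance as <R^2> = 2PL - 2P^2(1 - exp(-L/P)) for contour length L and persistence length P. The sketch below evaluates it to show the crossover from rod-like to coil-like behaviour; the model in the paper adds nuclear-envelope attachment and clustering forces, which this formula ignores, and the function name and units are illustrative assumptions.

```python
import math


def wlc_mean_sq_end_to_end(contour_length, persistence_length):
    """Mean squared end-to-end distance of a free worm-like chain.

    Kratky-Porod result: <R^2> = 2 P L - 2 P^2 (1 - exp(-L / P)).
    Both lengths must be in the same (arbitrary) units.
    """
    L, P = contour_length, persistence_length
    # expm1 avoids cancellation when L / P is small.
    return 2 * P * L - 2 * P ** 2 * (-math.expm1(-L / P))


# Short chains behave like rigid rods (<R^2> ~ L^2); long chains approach
# the random-coil limit (<R^2> ~ 2 P L). Units here are arbitrary.
for L in (0.1, 1.0, 10.0, 100.0):
    r = math.sqrt(wlc_mean_sq_end_to_end(L, persistence_length=1.0))
    print(f"L = {L:6.1f}  rms end-to-end = {r:.3f}  ratio to L = {r / L:.3f}")
```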

    CSI: A nonparametric Bayesian approach to network inference from multiple perturbed time series gene expression data

    How an organism responds to the environmental challenges it faces is heavily influenced by its gene regulatory networks (GRNs). Whilst most methods for inferring GRNs from time series mRNA expression data can only cope with a single time series (or a single perturbation with biological replicates), it is becoming increasingly common for several time series to be generated under different experimental conditions. The CSI algorithm (Klemm, 2008) represents one approach to inferring GRNs from multiple time series, and has previously been shown to perform well on a variety of datasets (Penfold and Wild, 2011). Another challenge in network inference is the identification of condition-specific GRNs, i.e., identifying how a GRN is rewired under different conditions or in different individuals. The Hierarchical Causal Structure Identification (HCSI) algorithm (Penfold et al., 2012) is one approach that allows inference of condition-specific networks (Hickman et al., 2013), and has been shown to reconstruct known networks more accurately than inference on the individual datasets alone. Here we describe a MATLAB implementation of CSI/HCSI that includes fast approximate solutions to CSI as well as Markov Chain Monte Carlo implementations of both CSI and HCSI, together with a user-friendly GUI, with the intention of making the analysis of networks from multiple perturbed time series datasets more accessible to the wider community. The GUI guides the user through each stage of the analysis, from loading the data to parameter selection and visualisation of networks, and can be launched by typing >> csi into the MATLAB command line. For each step of the analysis, links to documentation and tutorials are available within the GUI, including documentation on visualising and interacting with output files.
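
A minimal sketch of the parent-set idea behind causal structure identification, assuming a discrete-time model in which a target gene's expression at time t is a GP function of its candidate parents' expression at time t-1. It enumerates all parent sets up to a size limit, scores each by its exact GP marginal likelihood, and normalises under a uniform prior. This is not the MATLAB CSI/HCSI code: hyperparameters are fixed rather than optimised or sampled, and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def gp_log_marginal(X, y, variance=1.0, lengthscale=1.0, noise=0.01):
    """Log marginal likelihood of y under a zero-mean GP with an RBF kernel.

    X has shape (n, d); d = 0 gives a constant (no-parent) model.
    """
    n = len(y)
    if X.shape[1] == 0:
        K = variance * np.ones((n, n))
    else:
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = variance * np.exp(-0.5 * sq / lengthscale ** 2)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)


def parent_set_posterior(X_prev, y_next, max_parents=2):
    """Posterior over candidate parent sets for one target gene.

    X_prev: (T-1, G) expression of all genes at times 0..T-2
    y_next: (T-1,) expression of the target gene at times 1..T-1
    Uses a uniform prior over parent sets and fixed hyperparameters.
    """
    G = X_prev.shape[1]
    sets = [s for k in range(max_parents + 1) for s in combinations(range(G), k)]
    logp = np.array([gp_log_marginal(X_prev[:, list(s)], y_next) for s in sets])
    w = np.exp(logp - logp.max())
    w /= w.sum()
    # Marginal probability that each gene regulates the target.
    edge_prob = np.array([sum(p for s, p in zip(sets, w) if g in s) for g in range(G)])
    return dict(zip(sets, w)), edge_prob


# Synthetic example: only gene 1 drives the target.
rng = np.random.default_rng(1)
X_prev = rng.normal(size=(30, 3))
y_next = np.sin(X_prev[:, 1]) + 0.05 * rng.normal(size=30)
post, edge_prob = parent_set_posterior(X_prev, y_next)
# Should favour (1,), the gene actually used to generate y_next.
print(max(post, key=post.get), np.round(edge_prob, 3))
```

Exhaustive enumeration is only feasible because the in-degree is capped; the number of candidate sets grows combinatorially with `max_parents`.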

    Building a stem cell-based primate uterus

    Funder: Wellcome Trust
    The uterus is the organ for embryo implantation and fetal development. Most current models of the uterus are centred on capturing its function during later stages of pregnancy, to improve survival in pre-term births. However, in vitro models focusing on the uterine tissue itself would allow modelling of pathologies including endometriosis and uterine cancers, and open new avenues to investigate embryo implantation and human development. Motivated by these key questions, we discuss how stem cell-based uteri may be engineered from constituent cell parts, either as advanced self-organising cultures, or by controlled assembly through microfluidic and print-based technologies.

    Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks

    Motivation: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that all of the time series were generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations, including: (i) where different but overlapping sets of transcription factors are expected to bind in the different experimental conditions, that is, where switching events could potentially arise under the different treatments; and (ii) inference in evolutionarily related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. Results: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate the data were independent. The method was subsequently used alongside yeast one-hybrid and microarray time series data to infer potential transcriptional switches in the Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. Availability: The methods outlined in this article have been implemented in Matlab and are available on request.
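
One way to picture the hierarchical coupling is as a prior that pulls each condition's parent set for a gene towards a shared latent parent set. The sketch below assumes the form P(S | H) proportional to exp(-beta * |S symmetric-difference H|), normalised over all parent sets up to a size limit; this functional form, the parameter `beta` and the function name are assumptions for illustration, not the method from the paper.

```python
import math
from itertools import combinations


def coupling_prior(hyper_parents, n_genes, max_parents=2, beta=1.0):
    """Prior over one condition's parent sets given a shared latent parent set.

    Assumed form: P(S | H) proportional to exp(-beta * |S symmetric-difference H|),
    normalised over all parent sets with at most max_parents members.
    beta = 0 gives a uniform (uncoupled) prior; large beta pins S to H.
    """
    H = set(hyper_parents)
    sets = [s for k in range(max_parents + 1) for s in combinations(range(n_genes), k)]
    weights = [math.exp(-beta * len(set(s) ^ H)) for s in sets]
    Z = sum(weights)
    return {s: w / Z for s, w in zip(sets, weights)}


# How strongly the condition-specific set is tied to the shared set (1,).
for beta in (0.0, 1.0, 5.0):
    prior = coupling_prior((1,), n_genes=3, beta=beta)
    print(beta, round(prior[(1,)], 3), round(prior[(0, 2)], 3))
```

In a full hierarchical sampler, a prior like this would multiply each condition's marginal likelihoods, with the shared parent set itself updated in turn; that step is omitted here.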

    Branch-recombinant Gaussian processes for analysis of perturbations in biological time series

    MOTIVATION: A common class of behaviour encountered in the biological sciences involves branching and recombination. During branching, a statistical process bifurcates, resulting in two or more potentially correlated processes that may undergo further branching; the converse occurs during recombination, where two or more statistical processes converge. A key objective is to identify the time of this bifurcation (branch or recombination time) from time series measurements, e.g. by comparing a control time series with perturbed time series. Gaussian processes (GPs) represent an ideal framework for such analysis, allowing for nonlinear regression that includes a rigorous treatment of uncertainty. Currently, however, GP models only exist for two-branch systems. Here, we highlight how arbitrarily complex branching processes can be built using the correct composition of covariance functions within a GP framework, thus outlining a general framework for the treatment of branching and recombination in the form of branch-recombinant Gaussian processes (B-RGPs). RESULTS: We first benchmark the performance of B-RGPs against a variety of existing regression approaches, and demonstrate robustness to model misspecification. B-RGPs are then used to investigate the branching patterns of Arabidopsis thaliana gene expression following inoculation with the hemibiotrophic bacterium Pseudomonas syringae DC3000 and a disarmed mutant strain, hrpA. By grouping genes according to the number of branches, we could naturally separate genes involved in the basal immune response from those subverted by the virulent strain, and show enrichment for targets of pathogen protein effectors. Finally, we identify two early-branching genes, WRKY11 and WRKY17, and show that genes that branched at similar times to WRKY11/17 were enriched for W-box binding motifs and overrepresented among genes differentially expressed in WRKY11/17 knockouts, suggesting that branch time could be used to identify direct and indirect binding targets of key transcription factors. AVAILABILITY AND IMPLEMENTATION: https://github.com/cap76/BranchingGPs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
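
The abstract builds branching and recombination by composing covariance functions. As an illustration under an assumed construction, not the B-RGP kernels themselves: let the perturbed series equal the control plus an independent deviation, where the deviation is an RBF Gaussian process conditioned to be zero at a branch time and a recombination time, then masked to zero outside that interval. Conditioning keeps the deviation continuous at both ends, and because the result is the covariance of an actual process, the joint covariance stays positive semi-definite. All function names here are hypothetical.

```python
import numpy as np


def rbf(a, b, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def bridge_kernel(t, t_branch, t_recomb, **kw):
    """Covariance of a deviation that is zero outside (t_branch, t_recomb).

    An RBF GP conditioned on being zero at both times, then masked to zero
    outside the interval (an assumed construction, for illustration).
    """
    s = np.array([t_branch, t_recomb])
    Kts = rbf(t, s, **kw)
    Kss = rbf(s, s, **kw)
    # Conditional covariance given h(t_branch) = h(t_recomb) = 0.
    Kc = rbf(t, t, **kw) - Kts @ np.linalg.solve(Kss, Kts.T)
    inside = ((t > t_branch) & (t < t_recomb)).astype(float)
    return Kc * np.outer(inside, inside)


def branch_recombine_cov(t, t_branch, t_recomb, **kw):
    """Noise-free joint covariance of [control; perturbed] at times t.

    perturbed = control + deviation, with the two components independent.
    """
    K = rbf(t, t, **kw)
    D = bridge_kernel(t, t_branch, t_recomb, **kw)
    return np.block([[K, K], [K, K + D]])


t = np.linspace(0, 10, 41)
D = bridge_kernel(t, 3.0, 7.0)
C = branch_recombine_cov(t, 3.0, 7.0)
# Deviation variance: zero up to t = 3, positive in between, zero again from t = 7.
print(np.round(np.diag(D)[::4], 3))
```

Dropping the recombination constraint (conditioning only at the branch time and masking only before it) gives a plain two-branch model, and further branches could be added by stacking more such deviations.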

    High-resolution temporal profiling of transcripts during Arabidopsis leaf senescence reveals a distinct chronology of processes and regulation

    Leaf senescence is an essential developmental process that impacts dramatically on crop yields and involves altered regulation of thousands of genes and many metabolic and signaling pathways, resulting in major changes in the leaf. The regulation of senescence is complex, and although senescence regulatory genes have been characterized, there is little information on how these function in the global control of the process. We used microarray analysis to obtain a high-resolution time-course profile of gene expression during development of a single leaf over a 3-week period to senescence. A complex experimental design approach and a combination of methods were used to extract high-quality replicated data and to identify differentially expressed genes. The multiple time points enable the use of highly informative clustering to reveal distinct time points at which signaling and metabolic pathways change. Analysis of motif enrichment, as well as comparison of transcription factor (TF) families showing altered expression over the time course, identifies clear groups of TFs active at different stages of leaf development and senescence. These data enable connection of metabolic processes, signaling pathways, and specific TF activity, which will underpin the development of network models to elucidate the process of senescence.

    Origin and segregation of the human germline

    Acknowledgements: This work was supported by the Wellcome Investigator Awards in Science 209475/Z/17/Z (to MA Surani), the Wellcome Investigator Awards in Science 096738/Z/11/Z (to MA Surani), the BBSRC research grant G103986 (to MA Surani), the Croucher Postdoctoral Research Fellowship (to WWC Tang), the Wellcome 4-Yr PhD Programme in Stem Cell Biology & Medicine 203831/Z/16/Z (to A Castillo-Venzor) and the Cambridge Commonwealth European and International Trust (to A Castillo-Venzor), the Isaac Newton Trust (to WWC Tang), the Butterfield Awards of Great Britain Sasakawa Foundation (to T Kobayashi and MA Surani), and the Astellas Foundation for Research on Metabolic Disorders (to T Kobayashi). The marmoset embryo research is generously supported by the Wellcome Trust (WT RG89228, WT RG9242), the Centre for Trophoblast Research, the Isaac Newton Trust, and JSPS KAKENHI 15H02360, 19H05759. TE Boroviak was supported by a Wellcome Sir Henry Dale Fellowship. JC Marioni acknowledges core support from EMBL and from Cancer Research UK (C9545/A29580), which supports MD Morgan. We would like to thank Roger Barker and Xiaoling He for providing human embryonic tissues and Charles Bradshaw for bioinformatics support. We also thank The Weizmann Institute of Science for the WIS2 human PSC line and the Genomics Core Facility of CRUK Cambridge Institute for sequencing services. We thank members of the Surani laboratory for insightful comments and critical reading of the manuscript.